Data-driven Segment Preselection in the IBM Trainable Speech Synthesis System
Abstract
Unit-selection-based concatenative speech synthesis has proven to be a successful method of producing high-quality speech output. However, high-quality output requires large speech databases. For some applications this is impractical, because of both the complexity of the database search and the storage such databases require. In this paper, we propose a data-driven algorithm to reduce the size of the database used in concatenative synthesis. The algorithm preselects database speech segments based on statistics collected by synthesizing a large number of sentences with the full speech database. The algorithm is applied to the IBM trainable speech synthesis system, and the results show that the database size can be reduced substantially while maintaining output speech quality.
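The abstract does not say which statistics drive the preselection, so the following is only a minimal sketch under a stated assumption: that the statistic is how often each segment is chosen when a large set of sentences is synthesized with the full database. The function name `preselect_segments`, the `keep_fraction` parameter, and the string segment identifiers are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from typing import Iterable, List, Set


def preselect_segments(
    database_ids: Iterable[str],
    synthesis_runs: Iterable[List[str]],
    keep_fraction: float,
) -> Set[str]:
    """Keep the most frequently selected segments of a unit database.

    ASSUMPTION: usage frequency is the preselection statistic; the
    abstract only says "statistics", so this is illustrative.

    database_ids   -- identifiers of every segment in the full database
    synthesis_runs -- one list per synthesized sentence, holding the
                      ids of the segments the full system selected
    keep_fraction  -- share of the database to retain, in (0, 1]
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")

    # Count how many times each segment was picked across all runs.
    usage = Counter()
    for selected in synthesis_runs:
        usage.update(selected)

    # Rank by descending usage; ties are broken by id so the result is
    # deterministic. Segments that were never used have a count of 0.
    ranked = sorted(database_ids, key=lambda seg: (-usage[seg], seg))

    n_keep = max(1, math.ceil(len(ranked) * keep_fraction))
    return set(ranked[:n_keep])


if __name__ == "__main__":
    full_db = ["a1", "a2", "b1", "b2"]
    runs = [["a1", "b1"], ["a1", "b2"], ["a1", "b1"]]
    print(sorted(preselect_segments(full_db, runs, keep_fraction=0.5)))
```

A real system would probably also need to guarantee that every phonetic context keeps at least one candidate segment; this sketch ranks segments globally and does not enforce that.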
Similar resources
Reducing the footprint of the IBM trainable speech synthesis system
This paper presents a novel approach for concatenative speech synthesis. This approach enables reduction of the dataset size of a concatenative text-to-speech system, namely the IBM trainable speech synthesis system, by more than an order of magnitude. A spectral acoustic feature based speech representation is used for computing a cost function during segment selection as well as for speech gen...
SFC: A trainable prosodic model
This paper introduces a new model-constrained and data-driven system to generate prosody from metalinguistic information. This system considers the prosodic continuum as the superposition of multiple elementary overlapping multiparametric contours. These contours encode specific metalinguistic functions associated with various discourse units. We describe the phonological model underlying the s...
Phrase splicing and variable substitution using the IBM trainable speech synthesis system
This paper describes a phrase splicing and variable substitution system which offers an intermediate form of automated speech production, lying between the extremes of recorded utterance playback and full Text-to-Speech synthesis. The system incorporates a trainable speech synthesiser and an application-specific set of pre-recorded phrases. The text to be synthesised is converted to a phone se...
The IBM trainable speech synthesis system
The speech synthesis system described in this paper uses a set of speaker-dependent decision-tree state-clustered hidden Markov models to automatically generate a leaf level segmentation of a large single-speaker continuous-read-speech database. During synthesis, the phone sequence to be synthesised is converted to an acoustic leaf sequence by descending the HMM decision trees. Duration, energy...
Trainable speaker diarization
This paper presents a novel framework for speaker diarization. We explicitly model intra-speaker inter-segment variability using a speaker-labeled training corpus and use this modeling to assess the speaker similarity between speech segments. Modeling is done by embedding segments into a segment-space using kernel-PCA, followed by explicit modeling of speaker variability in the segment-space. O...